Identifying Multi-instance Outliers
Abstract
This paper studies a new data mining problem called multi-instance outlier identification. The problem arises in tasks where each sample is described by many alternative feature vectors (instances). We define multi-instance outliers and analyze their basic types. Two general identification approaches are proposed, both built on the state-of-the-art single-instance outlier detector LOF (local outlier factor). The first approach borrows the underlying mechanism of kernel methods and plugs a set distance into LOF to detect multi-instance outliers. The second approach takes each instance's neighborhood into account. Based on these two approaches, four concrete multi-instance outlier detectors are introduced. We conduct experiments on four synthetic data collections and three real-world data collections (two Musk data sets [22, 23] and a hard-drive inspection data set [24]). The experimental results show that the proposed multi-instance outlier detectors are effective, while algorithms that ignore the multi-instance setting perform poorly. In particular, the results on the two Musk sets are consistent with multi-instance learning results, and the results on the hard-drive inspection data set demonstrate that multi-instance outlier identification is promising for real applications.
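The first approach described in the abstract (plugging a set distance into LOF) can be illustrated with a minimal sketch. The abstract does not say which set distance the authors use, so the maximal Hausdorff distance below is an assumption, as are the function names `hausdorff` and `lof_scores` and the toy data. The LOF part follows the standard definition (k-distance, reachability distance, local reachability density) and is not the paper's own code.

```python
import math

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags (assumed set distance)."""
    def directed(src, dst):
        # Largest distance from an instance in src to its nearest instance in dst.
        return max(min(math.dist(u, v) for v in dst) for u in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def lof_scores(bags, k, set_distance=hausdorff):
    """Standard LOF computed over bags, using a set distance between bags."""
    n = len(bags)
    dist = [[0.0 if i == j else set_distance(bags[i], bags[j])
             for j in range(n)] for i in range(n)]

    # k-distance and k-neighborhood of every bag.
    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate bags

    # LOF: mean ratio of the neighbors' density to the bag's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]

# Toy data (hypothetical): five similar bags and one bag far from the rest.
bags = [[(float(i), 0.0), (float(i), 1.0)] for i in range(5)]
bags.append([(20.0, 20.0), (21.0, 20.0)])

scores = lof_scores(bags, k=2)
most_outlying = scores.index(max(scores))
print(most_outlying, [round(s, 2) for s in scores])
```

Scores close to 1 indicate a bag whose local density matches that of its neighbors, while substantially larger scores flag candidate outliers. Swapping in a different set distance (for example a minimal Hausdorff distance) only requires passing another function as `set_distance`.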
Publication date: 2010